Journals
  Publication Years
  Keywords
Search within results Open Search
Please wait a minute...
For Selected: Toggle Thumbnails
Imbalanced telecom customer data classification method based on dissimilarity
WANG Lin, GUO Nana
Journal of Computer Applications    2017, 37 (4): 1032-1037.   DOI: 10.11772/j.issn.1001-9081.2017.04.1032
Abstract436)      PDF (964KB)(450)       Save
It is difficult for conventional classification technology to discriminate churn customers in the context of imbalanced telecom customer dataset, therefore, an Improved Dissimilarity-Based imbalanced data Classification (IDBC) algorithm was proposed by introducing an improved prototype selection strategy to Dissimilarity-Based Classification (DBC) algorithm. In prototype selection stage, the improved sample subset optimization method was adopted to select the most valuable prototype set from the whole dataset, thus avoiding the uncertainties caused by the random selection; in classification stage, new feature space was constructed via dissimilarity between samples from train set and prototype set, and samples from test set and prototype set, and then dissimilarity-based datasets mapped into corresponding feature space were learnt with conventional classification algorithms. Finally, the telecom customer dataset and other six ordinary imbalanced datasets from UCI database were selected to test the performance of IDBC. Compared with the traditional imbalanced data classification algorithm based on features, the recognition rate of DBC algorithm for rare class was improved by 8.3% on average, and the recognition rate of IDBC algorithm for raw class was increased by 11.3%. The experimental results show that the IDBC algorithm is not affected by the category distribution, and the discriminative ability of IDBC algorithm outperforms existing state-of-the-art approaches.
Reference | Related Articles | Metrics